Blocking and Filtering Techniques for Entity Resolution

Related resources

Comparative Analysis of Approximate Blocking Techniques for Entity Resolution

Entity Resolution is a core task for merging data collections. Due to its quadratic complexity, it typically scales to large volumes of data through blocking: similar entities are clustered into blocks and pair-wise comparisons are executed only between co-occurring entities, at the cost of some missed matches. There are numerous blocking methods, and the aim of this work is to offer a comprehe...
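As a rough illustration of the blocking idea summarized above, the following minimal sketch uses token blocking, one simple scheme chosen here for illustration and not necessarily a method evaluated in that work. The function names `token_blocking` and `candidate_pairs` and the sample records are hypothetical. Each record is placed in one block per token, and only records sharing a block are compared.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Group record ids into blocks keyed by each lowercase token they contain."""
    blocks = defaultdict(set)
    for record_id, text in records.items():
        for token in text.lower().split():
            blocks[token].add(record_id)
    return blocks


def candidate_pairs(blocks):
    """Return the distinct pairs of record ids that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        # A block with a single record yields no pairs.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample data, invented for this sketch.
records = {
    1: "Canon EOS 5D camera",
    2: "canon eos 5d",
    3: "Nikon D750 camera",
    4: "Sony Alpha A7",
}

pairs = candidate_pairs(token_blocking(records))
print(sorted(pairs))
```

With four records, exhaustive comparison would examine six pairs; here only the pairs sharing at least one token are kept. The cost mentioned in the abstract shows up when two matching records share no token at all, since such a pair is never compared.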

Disinformation Techniques for Entity Resolution

We study the problem of disinformation. We assume that an “agent” has some sensitive information that the “adversary” is trying to obtain. For example, a camera company (the agent) may secretly be developing its new camera model, and a user (the adversary) may want to know in advance the detailed specs of the model. The agent’s goal is to disseminate false information to “dilute” what is known ...

MFIBlocks: An effective blocking algorithm for entity resolution

Entity resolution is the process of discovering groups of tuples that correspond to the same real-world entity. Blocking algorithms separate tuples into blocks that are likely to contain matching pairs. Tuning is a major challenge in the blocking process and in particular, high expertise is needed in contemporary blocking algorithms to construct a blocking key, based on which tuples are assigne...

Human-Powered Blocking in Entity Resolution: A Feasibility Study

Entity Resolution (ER) is the problem of matching the records that refer to the same entity within or across two or more data sources. In recent years, human-powered ER solutions have been proposed so that challenging ER tasks, that machines cannot do well, can be helped by human workers. While successful in achieving high matching accuracy, existing human-powered ER methods did not incorporate...

An Ensemble Blocking Scheme for Entity Resolution of Large and Sparse Datasets

Entity Resolution, also called record linkage or deduplication, refers to the process of identifying and merging duplicate versions of the same entity into a unified representation. The standard practice is to use a Rule based or Machine Learning based model that compares entity pairs and assigns a score to represent the pairs’ Match/Non-Match status. However, performing an exhaustive pair-wise...

Journal

Journal title: ACM Computing Surveys

Year: 2020

ISSN: 0360-0300,1557-7341

DOI: 10.1145/3377455